<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
  PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1">
  <title id="dq20941">EXPLAIN</title>
  <body>
    <p id="sql_command_desc">Shows the query plan of a statement.</p>
    <section id="section2">
      <title>Synopsis</title>
      <codeblock id="sql_command_synopsis">EXPLAIN [ ( <varname>option</varname> [, ...] ) ] <varname>statement</varname>
EXPLAIN [ANALYZE] [VERBOSE] <varname>statement</varname></codeblock>
      <p>where <varname>option</varname> can be one
        of:<codeblock>    ANALYZE [ <varname>boolean</varname> ]
    VERBOSE [ <varname>boolean</varname> ]
    COSTS [ <varname>boolean</varname> ]
    BUFFERS [ <varname>boolean</varname> ]
    TIMING [ <varname>boolean</varname> ]
    FORMAT { TEXT | XML | JSON | YAML }</codeblock></p>
    </section>
    <section id="section3">
      <title>Description</title>
      <p><codeph>EXPLAIN</codeph> displays the query plan that the Greenplum or Postgres Planner
        generates for the supplied statement. Query plans are a tree plan of nodes. Each node in the
        plan represents a single operation, such as table scan, join, aggregation or a sort.</p>
      <p>Plans should be read from the bottom up as each node feeds rows into the node directly
        above it. The bottom nodes of a plan are usually table scan operations (sequential, index or
        bitmap index scans). If the query requires joins, aggregations, or sorts (or other
        operations on the raw rows) then there will be additional nodes above the scan nodes to
        perform these operations. The topmost plan nodes are usually the Greenplum Database motion
        nodes (redistribute, explicit redistribute, broadcast, or gather motions). These are the
        operations responsible for moving rows between the segment instances during query
        processing.</p>
      <p>The output of <codeph>EXPLAIN</codeph> has one line for each node in the plan tree, showing
        the basic node type plus the following cost estimates that the planner made for the
        execution of that plan node:</p>
      <ul>
        <li id="dq155475"><b>cost</b> — the planner's guess at how long it will take to run the
          statement (measured in cost units that are arbitrary, but conventionally
          mean disk page fetches). Two cost numbers
          are shown: the start-up cost before the first row can be returned, and
          the total cost to return all the rows. Note that the total cost assumes that all
          rows will be retrieved, which may not always be the case (if using <codeph>LIMIT</codeph>
          for example).</li>
        <li id="dq155476"><b>rows</b> — the total number of rows output by this plan node. This is
          usually less than the actual number of rows processed or scanned by the plan node,
          reflecting the estimated selectivity of any <codeph>WHERE</codeph> clause conditions.
          Ideally the top-level nodes estimate will approximate the number of rows actually
          returned, updated, or deleted by the query.</li>
        <li id="dq155477"><b>width</b> — total bytes of all the rows output by this plan node.</li>
      </ul>
      <p>It is important to note that the cost of an upper-level node includes the cost of all its
        child nodes. The topmost node of the plan has the estimated total execution cost for the
        plan. This is this number that the planner seeks to minimize. It is also important to
        realize that the cost only reflects things that the query optimizer cares about. In
        particular, the cost does not consider the time spent transmitting result rows to the
        client.</p>
      <p><codeph>EXPLAIN ANALYZE</codeph> causes the statement to be actually run, not only
        planned. The <codeph>EXPLAIN ANALYZE</codeph> plan shows the actual results along with the
        planner's estimates. This is useful for seeing whether the planner's estimates are close to
        reality. In addition to the information shown in the <codeph>EXPLAIN</codeph> plan,
          <codeph>EXPLAIN ANALYZE</codeph> will show the following additional information:</p>
      <ul>
        <li id="dq155498">The total elapsed time (in milliseconds) that it took to run the
          query.</li>
        <li id="dq155499">The number of <i>workers</i> (segments) involved in a plan node operation.
          Only segments that return rows are counted.</li>
        <li id="dq155500">The maximum number of rows returned by the segment that produced the most
          rows for an operation. If multiple segments produce an equal number of rows, the one with
          the longest <i>time to end</i> is the one chosen.</li>
        <li id="dq155501">The segment id number of the segment that produced the most rows for an
          operation.</li>
        <li id="dq155554">For relevant operations, the <varname>work_mem</varname> used by the
          operation. If <codeph>work_mem</codeph> was not sufficient to perform the operation in
          memory, the plan will show how much data was spilled to disk and how many passes over the
          data were required for the lowest performing segment. For
          example:<codeblock>Work_mem used: 64K bytes avg, 64K bytes max (seg0).
Work_mem wanted: 90K bytes avg, 90K bytes max (seg0) to abate workfile 
I/O affecting 2 workers.
[seg0] pass 0: 488 groups made from 488 rows; 263 rows written to 
workfile
[seg0] pass 1: 263 groups made from 263 rows</codeblock></li>
        <li id="dq155502">The time (in milliseconds) it took to retrieve the first row from the
          segment that produced the most rows, and the total time taken to retrieve all rows from
          that segment. The <i>&lt;time&gt; to first row</i> may be omitted if it is the same as the
            <i>&lt;time&gt; to end</i>.</li>
      </ul>
      <note type="important">Keep in mind that the statement is actually run when
          <codeph>ANALYZE</codeph> is used. Although <codeph>EXPLAIN ANALYZE</codeph> will discard
        any output that a <codeph>SELECT</codeph> would return, other side effects of the statement
        will happen as usual. If you wish to use <codeph>EXPLAIN ANALYZE</codeph> on a DML statement
        without letting the command affect your data, use this
        approach:<codeblock>BEGIN;
EXPLAIN ANALYZE ...;
ROLLBACK;</codeblock></note>
      <p>Only the <codeph>ANALYZE</codeph> and <codeph>VERBOSE</codeph> options can be specified,
        and only in that order, without surrounding the option list in parentheses.</p>
    </section>
    <section id="section4">
      <title>Parameters</title>
      <parml>
        <plentry>
          <pt>ANALYZE</pt>
          <pd>Carry out the command and show the actual run times and other statistics. This
            parameter defaults to <codeph>FALSE</codeph> if you omit it; specify <codeph>ANALYZE
              true</codeph> to enable it. </pd>
        </plentry>
        <plentry>
          <pt>VERBOSE</pt>
          <pd>Display additional information regarding the plan. Specifically, include the output
            column list for each node in the plan tree, schema-qualify table and function names,
            always label variables in expressions with their range table alias, and always print the
            name of each trigger for which statistics are displayed. This parameter defaults to
              <codeph>FALSE</codeph>if you omit it; specify <codeph>VERBOSE true</codeph> to enable
            it.</pd>
        </plentry>
        <plentry>
          <pt>COSTS</pt>
          <pd>Include information on the estimated startup and total cost of each plan node, as well
            as the estimated number of rows and the estimated width of each row. This parameter
            defaults to <codeph>TRUE</codeph> if you omit it; specify <codeph>COSTS false</codeph>
            to disable it.</pd>
        </plentry>
        <plentry>
          <pt>BUFFERS</pt>
          <pd>Include information on buffer usage. Specifically, include the number of shared blocks
            hit, read, dirtied, and written, the number of local blocks hit, read, dirtied, and
            written, and the number of temp blocks read and written. A <i>hit</i> means that a read
            was avoided because the block was found already in cache when needed. Shared blocks
            contain data from regular tables and indexes; local blocks contain data from temporary
            tables and indexes; while temp blocks contain short-term working data used in sorts,
            hashes, Materialize plan nodes, and similar cases. The number of blocks <i>dirtied</i>
            indicates the number of previously unmodified blocks that were changed by this query;
            while the number of blocks <i>written</i> indicates the number of previously-dirtied
            blocks evicted from cache by this backend during query processing. The number of blocks
            shown for an upper-level node includes those used by all its child nodes. In text
            format, only non-zero values are printed. This parameter may only be used when
              <codeph>ANALYZE</codeph> is also enabled. This parameter defaults to <codeph>FALSE</codeph>
            if you omit it; specify <codeph>BUFFERS true</codeph> to enable it.</pd>
        </plentry>
        <plentry>
          <pt>TIMING</pt>
          <pd>Include actual startup time and time spent in each node in the output. The overhead of
            repeatedly reading the system clock can slow down the query significantly on some
            systems, so it may be useful to set this parameter to <codeph>FALSE</codeph> when only
            actual row counts, and not exact times, are needed. Run time of the entire statement is
            always measured, even when node-level timing is turned off with this option. This
            parameter may only be used when <codeph>ANALYZE</codeph> is also enabled. It defaults to
              <codeph>TRUE</codeph>. </pd>
        </plentry>
        <plentry>
          <pt>FORMAT</pt>
          <pd>Specify the output format, which can be <codeph>TEXT</codeph>, <codeph>XML</codeph>,
              <codeph>JSON</codeph>, or <codeph>YAML</codeph>. Non-text output contains the same
            information as the text output format, but is easier for programs to parse. This
            parameter defaults to <codeph>TEXT</codeph>.</pd>
        </plentry>
        <plentry>
          <pt><varname>boolean</varname></pt>
          <pd>Specifies whether the selected option should be turned on or off. You can write
              <codeph>TRUE</codeph>, <codeph>ON</codeph>, or <codeph>1</codeph> to enable the
            option, and <codeph>FALSE</codeph>, <codeph>OFF</codeph>, or <codeph>0</codeph> to
            disable it. The boolean value can also be omitted, in which case <codeph>TRUE</codeph>
            is assumed.</pd>
        </plentry>
        <plentry>
          <pt><varname>statement</varname></pt>
          <pd>Any <codeph>SELECT</codeph>, <codeph>INSERT</codeph>, <codeph>UPDATE</codeph>,
              <codeph>DELETE</codeph>, <codeph>VALUES</codeph>, <codeph>EXECUTE</codeph>,
              <codeph>DECLARE</codeph>, or <codeph>CREATE TABLE AS</codeph> statement, whose
            execution plan you wish to see.</pd>
        </plentry>
      </parml>
    </section>
    <section id="section5"><title>Notes</title><p>In order to allow the query optimizer to make
        reasonably informed decisions when optimizing queries, the <codeph>ANALYZE</codeph>
        statement should be run to record statistics about the distribution of data within the
        table. If you have not done this (or if the statistical distribution of the data in the
        table has changed significantly since the last time <codeph>ANALYZE</codeph> was run), the
        estimated costs are unlikely to conform to the real properties of the query, and
        consequently an inferior query plan may be chosen.</p>An SQL statement that is run during
      the execution of an <codeph>EXPLAIN ANALYZE</codeph> command is excluded from Greenplum
      Database resource queues.<p>For more information about query profiling, see "Query Profiling"
        in the <cite>Greenplum Database Administrator Guide</cite>. For more information about
        resource queues, see "Resource Management with Resource Queues" in the <cite>Greenplum
          Database Administrator Guide</cite>.</p></section>
    <section id="section6">
      <title>Examples</title>
      <p>To illustrate how to read an <codeph>EXPLAIN</codeph> query plan, consider the following
        example for a very simple query:</p>
      <codeblock>EXPLAIN SELECT * FROM names WHERE name = 'Joelle';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)  (cost=0.00..431.27 rows=1 width=58)
   ->  Seq Scan on names  (cost=0.00..431.27 rows=1 width=58)
         Filter: (name = 'Joelle'::text)
 Optimizer: Pivotal Optimizer (GPORCA) version 3.23.0
(4 rows)</codeblock>

      <p>If we read the plan from the bottom up, the query optimizer starts by doing a sequential
        scan of the <codeph>names</codeph> table. Notice that the <codeph>WHERE</codeph> clause is
        being applied as a <i>filter</i> condition. This means that the scan operation checks the
        condition for each row it scans, and outputs only the ones that pass the condition. </p>
      <p>The results of the scan operation are passed up to a <i>gather motion</i> operation. In
        Greenplum Database, a gather motion is when segments send rows up to the master. In this
        case we have 3 segment instances sending to 1 master instance (3:1). This operation is
        working on <codeph>slice1</codeph> of the parallel query execution plan. In Greenplum
        Database a query plan is divided into <i>slices</i> so that portions of the query plan can
        be worked on in parallel by the segments.</p>
      <p>The estimated startup cost for this plan is <codeph>00.00</codeph> (no cost) and a total
        cost of <codeph>431.27</codeph>. The planner is estimating that this query will return one
        row.</p>
      <p>Here is the same query, with cost estimates suppressed:</p>
      <codeblock>EXPLAIN (COSTS FALSE) SELECT * FROM names WHERE name = 'Joelle';
                QUERY PLAN
------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   ->  Seq Scan on names
         Filter: (name = 'Joelle'::text)
 Optimizer: Pivotal Optimizer (GPORCA) version 3.23.0
(4 rows)</codeblock> 

      <p>Here is the same query, with JSON formatting:</p>
      <codeblock>EXPLAIN (FORMAT JSON) SELECT * FROM names WHERE name = 'Joelle';
                  QUERY PLAN
-----------------------------------------------
 [                                            +
   {                                          +
     "Plan": {                                +
       "Node Type": "Gather Motion",          +
       "Senders": 3,                          +
       "Receivers": 1,                        +
       "Slice": 1,                            +
       "Segments": 3,                         +
       "Gang Type": "primary reader",         +
       "Startup Cost": 0.00,                  +
       "Total Cost": 431.27,                  +
       "Plan Rows": 1,                        +
       "Plan Width": 58,                      +
       "Plans": [                             +
         {                                    +
           "Node Type": "Seq Scan",           +
           "Parent Relationship": "Outer",    +
           "Slice": 1,                        +
           "Segments": 3,                     +
           "Gang Type": "primary reader",     +
           "Relation Name": "names",          +
           "Alias": "names",                  +
           "Startup Cost": 0.00,              +
           "Total Cost": 431.27,              +
           "Plan Rows": 1,                    +
           "Plan Width": 58,                  +
           "Filter": "(name = 'Joelle'::text)"+
         }                                    +
       ]                                      +
     },                                       +
     "Settings": {                            +
       "Optimizer": "Pivotal Optimizer (GPORCA) version 3.23.0"      +
     }                                        +
   }                                          +
 ]
(1 row)</codeblock>
      <p>If there is an index and we use a query with an indexable <codeph>WHERE</codeph> condition,
          <codeph>EXPLAIN</codeph> might show a different plan. This query generates a plan with an
        index scan, with YAML
        formatting:<codeblock>EXPLAIN (FORMAT YAML) SELECT * FROM NAMES WHERE LOCATION='Sydney, Australia';
                          QUERY PLAN
--------------------------------------------------------------
 - Plan:                                                     +
     Node Type: "Gather Motion"                              +
     Senders: 3                                              +
     Receivers: 1                                            +
     Slice: 1                                                +
     Segments: 3                                             +
     Gang Type: "primary reader"                             +
     Startup Cost: 0.00                                      +
     Total Cost: 10.81                                       +
     Plan Rows: 10000                                        +
     Plan Width: 70                                          +
     Plans:                                                  +
       - Node Type: "Index Scan"                             +
         Parent Relationship: "Outer"                        +
         Slice: 1                                            +
         Segments: 3                                         +
         Gang Type: "primary reader"                         +
         Scan Direction: "Forward"                           +
         Index Name: "names_idx_loc"                         +
         Relation Name: "names"                              +
         Alias: "names"                                      +
         Startup Cost: 0.00                                  +
         Total Cost: 7.77                                    +
         Plan Rows: 10000                                    +
         Plan Width: 70                                      +
         Index Cond: "(location = 'Sydney, Australia'::text)"+
   Settings:                                                 +
     Optimizer: "Pivotal Optimizer (GPORCA) version 3.23.0"
(1 row)</codeblock></p>    </section>
    <section id="section7">
      <title>Compatibility</title>
      <p>There is no <codeph>EXPLAIN</codeph> statement defined in the SQL standard.</p>
    </section>
    <section id="section8">
      <title>See Also</title>
      <p><codeph><xref href="ANALYZE.xml#topic1" type="topic" format="dita"/></codeph></p>
    </section>
  </body>
</topic>
